Clusters in Very Large Datasets with High Dimensions
Authors
Abstract
Multimedia databases typically contain data with very high dimensions. Finding interesting patterns in these databases is a very challenging problem because of scalability, the lack of domain knowledge, and the complex structures of the embedded clusters. High dimensionality severely aggravates the scalability problem. It has been shown that the wavelet-based clustering technique, WaveCluster, is very efficient and effective in detecting arbitrarily shaped clusters and eliminating noisy data for low-dimensional data. In this paper, we introduce HiperWave, an approach to applying wavelet-based techniques for clustering high-dimensional data. Using a hash-based data structure, our approach makes intelligent use of available resources to discover clusters in the dataset. We demonstrate that the cost of clustering can be reduced dramatically while maintaining all the advantages of wavelet-based clustering. This hash-based data representation can be applied to any grid-based clustering approach. Finally, we introduce a quantitative metric to measure the quality of the resulting clusters. The experimental results show both the effectiveness and the efficiency of our method on high-dimensional datasets.
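The abstract does not describe the hash-based data structure in detail, so the following is only a minimal sketch of the general idea behind hashing a grid: store counts for occupied cells only, rather than allocating the full grid. The function names `quantize` and `build_sparse_grid`, the uniform binning, and the parameters `lows`, `highs`, and `bins` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Cell = Tuple[int, ...]


def quantize(point: Sequence[float], lows: Sequence[float],
             highs: Sequence[float], bins: int) -> Cell:
    """Map a point to integer grid-cell coordinates, one index per dimension.

    Hypothetical helper: assumes uniform bins between lows[d] and highs[d].
    """
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        idx = int((x - lo) / (hi - lo) * bins)
        # Clamp so that values on or beyond the boundary fall in a valid bin.
        cell.append(min(max(idx, 0), bins - 1))
    return tuple(cell)


def build_sparse_grid(points: Iterable[Sequence[float]], lows: Sequence[float],
                      highs: Sequence[float], bins: int) -> Dict[Cell, int]:
    """Count points per occupied cell; empty cells are never stored."""
    counts: Dict[Cell, int] = defaultdict(int)
    for p in points:
        counts[quantize(p, lows, highs, bins)] += 1
    return dict(counts)


# Three points in 10 dimensions, 4 bins per dimension.
dims, bins = 10, 4
points = [(0.1,) * dims, (0.12,) * dims, (0.9,) * dims]
grid = build_sparse_grid(points, [0.0] * dims, [1.0] * dims, bins)
print(len(grid), "occupied cells out of", bins ** dims)
```

A dense array for this example would need 4^10 cells, while the dictionary holds at most one entry per data point. Any grid-based method that only needs to visit occupied cells could in principle work from such a map.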
Journal title:
Volume, issue:
Pages: -
Publication date: 1998